Fix: Inspector reports should link to CVEs (#6557) #6562
Conversation
1a818a8
to
255666c
Compare
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## develop #6562 +/- ##
========================================
Coverage 85.42% 85.42%
========================================
Files 155 155
Lines 20750 20750
========================================
Hits 17726 17726
Misses 3024 3024
scripts/export_inspector_findings.py
Outdated
findings_vuln_sorted = {vuln: findings[vuln] for vuln in sorted(findings)}
for vulnerability, summaries in sorted(findings_vuln_sorted.items(),
This double sorting doesn't work since the second sort jumbles up the first sort's results. Also, the current sorting is flawed in that it would sort ['CVE-2024-500', 'CVE-2024-2000', 'CVE-2024-90'] alphanumerically (2000, 500, 90) instead of numerically (90, 500, 2000).
Consider that findings_sort() is already doing a secondary sort by returning a tuple (score, vulnerability_name). This could be modified to sort by (score, int(vulnerability_number)) to achieve a secondary sort using the numeric part of the vulnerability.
Here's a proof of concept:
Index: scripts/export_inspector_findings.py
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/scripts/export_inspector_findings.py b/scripts/export_inspector_findings.py
--- a/scripts/export_inspector_findings.py (revision 255666ceff1d07ea89e2943e57fe6b30e9bfdd48)
+++ b/scripts/export_inspector_findings.py (date 1726158979848)
@@ -10,6 +10,9 @@
import json
import logging
import sys
+from typing import (
+ Any,
+)
from furl import (
furl,
@@ -173,13 +176,17 @@
cols = list(chars) + [a + b for a in chars for b in chars]
return cols[col - 1]
- def findings_sort(self, item: tuple[str, list[SummaryType]]) -> tuple[int, str]:
+ def findings_sort(self, item: tuple[str, list[SummaryType]]) -> tuple[int, tuple[Any, ...]]:
score = 0
weights = {'HIGH': 1, 'CRITICAL': 10}
for summary in item[1]:
count = len(summary['resources'])
score += count * weights.get(summary['severity'], 0)
- return score, item[0]
+ name_parts = item[0].split('-')
+ if len(name_parts) == 3 and name_parts[0] == 'CVE':
+ return score, (int(name_parts[1]), int(name_parts[2]))
+ else:
+ return score, (item[0],)
def write_to_csv(self,
findings: dict[str, list[SummaryType]],
@@ -195,10 +202,11 @@
lookup = dict(zip(titles, range(len(titles))))
rows = [titles]
- findings_vuln_sorted = {vuln: findings[vuln] for vuln in sorted(findings)}
- for vulnerability, summaries in sorted(findings_vuln_sorted.items(),
+ for vulnerability, summaries in sorted(findings.items(),
key=self.findings_sort,
reverse=True):
+ # FIXME: Delete this debug print
+ print(vulnerability, [s['severity'] for s in summaries])
# A mapping of column index to abbreviated severity value
column_values = {
lookup[key]: summary['severity'][0:1]
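To make the ordering problem described in the review comment above concrete, here is a minimal, self-contained sketch. It is not the script's code: `cve_sort_key` is a hypothetical helper, and the leading 0/1 element is an assumption added here (it is not in the patch) so that keys for CVE and non-CVE names remain comparable with each other.

```python
def cve_sort_key(name: str) -> tuple:
    """Hypothetical key: order CVE IDs by numeric year and sequence number."""
    parts = name.split('-')
    if (
        len(parts) == 3
        and parts[0] == 'CVE'
        and parts[1].isdigit()
        and parts[2].isdigit()
    ):
        # Leading 0 groups CVE IDs ahead of all other names
        return (0, int(parts[1]), int(parts[2]))
    # Leading 1 ensures an int is never compared with a str
    return (1, name)


ids = ['CVE-2024-500', 'CVE-2024-2000', 'CVE-2024-90']
plain = sorted(ids)                      # string comparison
numeric = sorted(ids, key=cve_sort_key)  # numeric comparison
print(plain)
print(numeric)
```

With plain string comparison, 2000 sorts before 500 and 90. With the numeric key, the order becomes 90, 500, 2000.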
255666c
to
3481d61
Compare
4fc72d1
to
8c8fcc0
Compare
Consider storing the URL in the summary dictionary instead of creating a separate vulnerability_links dictionary.
Since the script groups findings by vulnerability, and there is one URL provided per finding, it is possible that one vulnerability will have more than one unique URL. In this case I think it makes sense to use the most common URL for a given vulnerability rather than the first (or last) URL encountered. The patch below uses this approach.
Index: scripts/export_inspector_findings.py
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/scripts/export_inspector_findings.py b/scripts/export_inspector_findings.py
--- a/scripts/export_inspector_findings.py (revision 8c8fcc00e7961fd3899e8943e25b9d7a7a66c3a4)
+++ b/scripts/export_inspector_findings.py (date 1726699545451)
@@ -3,6 +3,7 @@
vulnerability.
"""
from collections import (
+ Counter,
defaultdict,
)
import csv
@@ -11,10 +12,6 @@
import logging
import sys
-from furl import (
- furl,
-)
-
from azul.args import (
AzulArgumentHelpFormatter,
)
@@ -119,13 +116,11 @@
if self.args.json:
self.dump_to_json(findings)
parsed_findings = defaultdict(list)
- vulnerability_links = defaultdict(furl)
for finding in findings:
- vulnerability, source_url, summary = self.parse_finding(finding)
- vulnerability_links[vulnerability].url = source_url
+ vulnerability, summary = self.parse_finding(finding)
parsed_findings[vulnerability].append(summary)
log.info('Found %i unique vulnerabilities', len(parsed_findings))
- self.write_to_csv(parsed_findings, vulnerability_links)
+ self.write_to_csv(parsed_findings)
log.info('Done.')
def dump_to_json(self, findings: JSONs) -> None:
@@ -134,7 +129,7 @@
with open(output_file_name, 'w') as f:
json.dump({'findings': findings}, f, default=str, indent=4)
- def parse_finding(self, finding: JSON) -> tuple[str, str, SummaryType]:
+ def parse_finding(self, finding: JSON) -> tuple[str, SummaryType]:
severity = finding['severity']
# The vulnerabilityId is usually a substring of the finding title (e.g.
# "CVE-2023-44487" vs"CVE-2023-44487 - google.golang.org/grpc,
@@ -145,9 +140,9 @@
assert len(finding['resources']) == 1, finding
resource = finding['resources'][0]
resource_type = resource['type']
- source_url = finding['packageVulnerabilityDetails']['sourceUrl']
summary = {
'severity': severity,
+ 'source_url': finding['packageVulnerabilityDetails']['sourceUrl'],
'resource_type': resource_type,
'resources': set(),
}
@@ -165,7 +160,7 @@
self.instances.add(instance)
else:
assert False, resource
- return vulnerability, source_url, summary
+ return vulnerability, summary
def column_alpha(self, col: int) -> str:
assert col > 0, col
@@ -188,9 +183,7 @@
finding_name = finding_name.replace(id, padded_id)
return score, finding_name
- def write_to_csv(self,
- findings: dict[str, list[SummaryType]],
- vulnerability_links: dict[str, furl]) -> None:
+ def write_to_csv(self, findings: dict[str, list[SummaryType]]) -> None:
titles = [
'Vulnerability',
'Severity',
@@ -214,7 +207,11 @@
row_num = len(rows) + 1
col_range = f'C{row_num}:{last_col}{row_num}'
severity_formula = f'=(COUNTIF({col_range},"C")*10)+(COUNTIF({col_range},"H"))'
- url = vulnerability_links[vulnerability].url
+ urls = Counter([summary['source_url'] for summary in summaries])
+ if len(urls.keys()) > 1:
+ log.warning('More than one URL found for %s, using most common', vulnerability)
+ log.warning(dict(urls.most_common()))
+ url = urls.most_common(1)[0][0]
vulnerability_hyperlink = f'=HYPERLINK("{url}","{vulnerability}")'
row = [vulnerability_hyperlink, severity_formula]
for column_index in range(len(row), len(titles) + 1):
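The "most common URL" heuristic from the patch above can be tried in isolation. The sketch below uses only `collections.Counter` from the standard library. `pick_url` and the example.org URLs are made up for illustration and are not part of the script.

```python
from collections import Counter


def pick_url(summaries: list[dict]) -> tuple[str, bool]:
    """Hypothetical helper: return the most frequent source URL and
    whether more than one distinct URL was seen."""
    urls = Counter(summary['source_url'] for summary in summaries)
    # most_common(1) returns a one-element list of (value, count) pairs
    url, _count = urls.most_common(1)[0]
    return url, len(urls) > 1


summaries = [
    {'source_url': 'https://example.org/a'},
    {'source_url': 'https://example.org/b'},
    {'source_url': 'https://example.org/b'},
]
url, ambiguous = pick_url(summaries)
print(url, ambiguous)
```

When two URLs are tied, `most_common` returns them in the order they were first encountered, so in that case the choice still depends on input order.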
scripts/export_inspector_findings.py
Outdated
@@ -169,12 +176,21 @@ def column_alpha(self, col: int) -> str:
def findings_sort(self, item: tuple[str, list[SummaryType]]) -> tuple[int, str]:
score = 0
weights = {'HIGH': 1, 'CRITICAL': 10}
for summary in item[1]:
finding_name, summaries = item
- finding_name, summaries = item
+ vulnerability, summaries = item
scripts/export_inspector_findings.py
Outdated
if finding_name.startswith('CVE-'):
# Best secondary-sorting effort on CVE findings, vulnerability names
# not prefixed with 'CVE' may reflect an inaccurate secondary order.
id = finding_name.rsplit('-', 1)[1]
- id = finding_name.rsplit('-', 1)[1]
+ id = finding_name.split('-')[-1]
8c8fcc0
to
7c057cf
Compare
padded_id can be inlined.
prefix, _, id = vulnerability.rpartition('-')
vulnerability = '-'.join([prefix, f'{id:0>6}'])
scripts/export_inspector_findings.py
Outdated
# Best secondary-sorting effort on CVE findings, vulnerability names
# not prefixed with 'CVE' may reflect an inaccurate secondary order.
id = vulnerability.split('-')[-1]
padded_id = '000000'[:abs(6 - len(id))] + id
- padded_id = '000000'[:abs(6 - len(id))] + id
+ padded_id = f'{id:0>6}'
With abs:
>>> for id in ['1234', '12345', '123456', '1234567', '12345678']:
... print('000000'[:abs(6 - len(id))] + id)
001234
012345
123456
01234567
0012345678
With max:
>>> for id in ['1234', '12345', '123456', '1234567', '12345678']:
... print('000000'[:max(0, 6 - len(id))] + id)
001234
012345
123456
1234567
12345678
With f-string:
>>> for id in ['1234', '12345', '123456', '1234567', '12345678']:
... print(f'{id:0>6}')
001234
012345
123456
1234567
12345678
Also (not needed if you use the f-string approach), remember that a string can be multiplied by a number:
>>> '0' * 5
'00000'
Good call on the f-string approach.
Thus far I've only seen five digits in the part of the CVE ID that comes after the year, which is why I added six 0's (one extra for good measure).
scripts/export_inspector_findings.py
Outdated
if vulnerability.startswith('CVE-'):
# Best secondary-sorting effort on CVE findings, vulnerability names
# not prefixed with 'CVE' may reflect an inaccurate secondary order.
id = vulnerability.split('-')[-1]
- id = vulnerability.split('-')[-1]
+ prefix, _, id = vulnerability.rpartition('-')
Reverted to my original approach; I think it's more deliberate in intent.
scripts/export_inspector_findings.py
Outdated
# not prefixed with 'CVE' may reflect an inaccurate secondary order.
id = vulnerability.split('-')[-1]
padded_id = '000000'[:abs(6 - len(id))] + id
vulnerability = vulnerability.replace(id, padded_id)
- vulnerability = vulnerability.replace(id, padded_id)
+ vulnerability = '-'.join([prefix, padded_id])
This doesn't quite work, since it omits the "year" aspect of the CVE name, which also needs to be considered for the secondary sort.
rpartition() splits on the first match from the right. The year isn't omitted; it's included in the first element returned by rpartition().
>>> "CVE-2024-123".rpartition('-')
('CVE-2024', '-', '123')
I think the variable name prefix was throwing you off. How about:
cve_year, _, id = vulnerability.rpartition('-')
I see, that's what it was.
But I still prefer my approach.
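For reference, the two ways of rebuilding the padded name that are debated in this thread behave identically for most IDs, but they can diverge when the sequence number also occurs earlier in the name. A minimal sketch follows. Both function names are invented here for illustration, and the width of six mirrors the snippet under review.

```python
def pad_with_replace(vulnerability: str, width: int = 6) -> str:
    """Pad the last segment, then substitute it via str.replace()."""
    sequence = vulnerability.split('-')[-1]
    padded = f'{sequence:0>{width}}'
    # str.replace() substitutes every occurrence, not only the last segment
    return vulnerability.replace(sequence, padded)


def pad_with_rpartition(vulnerability: str, width: int = 6) -> str:
    """Pad the last segment and rejoin it with the untouched prefix."""
    prefix, _, sequence = vulnerability.rpartition('-')
    return '-'.join([prefix, f'{sequence:0>{width}}'])


for name in ['CVE-2024-123', 'CVE-2024-2024']:
    print(name, pad_with_replace(name), pad_with_rpartition(name))
```

For 'CVE-2024-2024', the replace-based version also rewrites the year, while the rpartition-based version touches only the sequence number.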
7cdf279
to
255666c
Compare
a463745
to
90621c0
Compare
scripts/export_inspector_findings.py
Outdated
# Best secondary-sorting effort on CVE findings, vulnerability names
# not prefixed with 'CVE' may reflect an inaccurate secondary order.
- # Best secondary-sorting effort on CVE findings, vulnerability names
- # not prefixed with 'CVE' may reflect an inaccurate secondary order.
+ # Best effort on sorting CVEs by ascending year and sequence number. Other types of findings are sorted strictly alphanumerically.
scripts/export_inspector_findings.py
Outdated
# CVE IDs use a maximum of seven digits in the sequence number
# portion of the ID, so the sequence number portion is normalized
# to a maximum length of seven, for accurate alphanumerical sorting.
# See https://cve.mitre.org/cve/identifiers/syntaxchange.html#new.
- # CVE IDs use a maximum of seven digits in the sequence number
- # portion of the ID, so the sequence number portion is normalized
- # to a maximum length of seven, for accurate alphanumerical sorting.
- # See https://cve.mitre.org/cve/identifiers/syntaxchange.html#new.
+ # The sequence number portion of CVE IDs is at most seven digits long. We pad it to that length so that, for example, a CVE with sequence number 2 precedes one with number 11.
+ # See https://cve.mitre.org/cve/identifiers/syntaxchange.html#new.
Swapped 11 and 2 to reflect reality, as shown in the example above for line 49 and 50.
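A small sketch of the effect that the comment under discussion describes: zero-padding the sequence number lets ordinary string comparison agree with numeric order. `padded_name` is a hypothetical stand-in for the script's logic, with the width of seven taken from the comment. The sketch sorts in ascending order only and does not model the script's score-based primary sort or its `reverse=True`.

```python
def padded_name(vulnerability: str, width: int = 7) -> str:
    """Hypothetical key: left-pad a CVE sequence number with zeros."""
    if vulnerability.startswith('CVE-'):
        prefix, _, sequence = vulnerability.rpartition('-')
        return f'{prefix}-{sequence:0>{width}}'
    # Names that are not CVE IDs are compared unchanged
    return vulnerability


ids = ['CVE-2024-11', 'CVE-2024-2', 'CVE-2023-100']
unpadded_order = sorted(ids)
padded_order = sorted(ids, key=padded_name)
print(unpadded_order)
print(padded_order)
```

Without padding, 'CVE-2024-11' sorts before 'CVE-2024-2'. With padding, sequence number 2 precedes 11 within the same year.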
scripts/export_inspector_findings.py
Outdated
if len(urls.keys()) > 1:
log.warning('More than one URL found for %s, using most common', vulnerability)
log.warning(dict(urls.most_common()))
url = urls.most_common(1)[0][0]
- if len(urls.keys()) > 1:
- log.warning('More than one URL found for %s, using most common', vulnerability)
- log.warning(dict(urls.most_common()))
- url = urls.most_common(1)[0][0]
+ url, frequency = one(urls.most_common(1)) # REVIEW: my version is stricter and more readable
+ if len(urls) > 1:
+ log.debug('URLs by frequency: %r', urls.most_common())
+ log.warning('More than one URL found for %s, using the most common one (%r)', vulnerability, url)
Are you sure each URL can be listed more than once? If so, please explain your reasoning behind this heuristic. If not, please simplify the above.
scripts/export_inspector_findings.py
Outdated
log.warning(dict(urls.most_common()))
url = urls.most_common(1)[0][0]
vulnerability_hyperlink = f'=HYPERLINK("{url}","{vulnerability}")'
row = [vulnerability_hyperlink, severity_formula]
- row = [vulnerability_hyperlink, severity_formula]
+ row = [hyperlink, severity_formula]
90621c0
to
cae8f71
Compare
cae8f71
to
d37970c
Compare
Security design review
d37970c
to
3332156
Compare
3332156
to
420b641
Compare
Connected issues: #6557
Checklist
Author
develop
issues/<GitHub handle of author>/<issue#>-<slug>
1 when the issue title describes a problem, the corresponding PR
title is
Fix:
followed by the issue titleAuthor (partiality)
p
tag to titles of partial commitspartial
or completely resolves all connected issuespartial
labelAuthor (chains)
base
or this PR is not chained to another PRchained
or is not chained to another PRAuthor (reindex, API changes)
r
tag to commit title or the changes introduced by this PR will not require reindexing of any deploymentreindex:dev
or the changes introduced by it will not require reindexing ofdev
reindex:anvildev
or the changes introduced by it will not require reindexing ofanvildev
reindex:anvilprod
or the changes introduced by it will not require reindexing ofanvilprod
reindex:prod
or the changes introduced by it will not require reindexing ofprod
reindex:partial
and its description documents the specific reindexing procedure fordev
,anvildev
,anvilprod
andprod
or requires a full reindex or carries none of the labelsreindex:dev
,reindex:anvildev
,reindex:anvilprod
andreindex:prod
API
or this PR does not modify a REST APIa
(A
) tag to commit title for backwards (in)compatible changes or this PR does not modify a REST APIapp.py
or this PR does not modify a REST APIAuthor (upgrading deployments)
make docker_images.json
and committed the resulting changes or this PR does not modifyazul_docker_images
, or any other variables referenced in the definition of that variableu
tag to commit title or this PR does not require upgrading deploymentsupgrade
or does not require upgrading deploymentsdeploy:shared
or does not modifydocker_images.json
, and does not require deploying theshared
component for any other reasondeploy:gitlab
or does not require deploying thegitlab
componentdeploy:runner
or does not require deploying therunner
imageAuthor (hotfixes)
F
tag to main commit title or this PR does not include permanent fix for a temporary hotfixanvilprod
andprod
) have temporary hotfixes for any of the issues connected to this PRAuthor (before every review)
develop
, squashed old fixupsmake requirements_update
or this PR does not modifyrequirements*.txt
,common.mk
,Makefile
andDockerfile
R
tag to commit title or this PR does not modifyrequirements*.txt
reqs
or does not modifyrequirements*.txt
make integration_test
passes in personal deployment or this PR does not modify functionality that could affect the IT outcomePeer reviewer (after approval)
System administrator (after approval)
demo
orno demo
no demo
no sandbox
N reviews
label is accurateOperator (before pushing merge the commit)
reindex:…
labels andr
commit title tagno demo
develop
_select dev.shared && CI_COMMIT_REF_NAME=develop make -C terraform/shared apply_keep_unused
or this PR is not labeleddeploy:shared
_select dev.gitlab && CI_COMMIT_REF_NAME=develop make -C terraform/gitlab apply
or this PR is not labeleddeploy:gitlab
_select anvildev.shared && CI_COMMIT_REF_NAME=develop make -C terraform/shared apply_keep_unused
or this PR is not labeleddeploy:shared
_select anvildev.gitlab && CI_COMMIT_REF_NAME=develop make -C terraform/gitlab apply
or this PR is not labeleddeploy:gitlab
deploy:gitlab
deploy:gitlab
System administrator
dev.gitlab
are complete or this PR is not labeleddeploy:gitlab
anvildev.gitlab
are complete or this PR is not labeleddeploy:gitlab
Operator (before pushing merge the commit)
_select dev.gitlab && make -C terraform/gitlab/runner
or this PR is not labeleddeploy:runner
_select anvildev.gitlab && make -C terraform/gitlab/runner
or this PR is not labeleddeploy:runner
sandbox
label or PR is labeledno sandbox
dev
or PR is labeledno sandbox
anvildev
or PR is labeledno sandbox
sandbox
deployment or PR is labeledno sandbox
anvilbox
deployment or PR is labeledno sandbox
sandbox
deployment or PR is labeledno sandbox
anvilbox
deployment or PR is labeledno sandbox
sandbox
or this PR does not remove catalogs or otherwise causes unreferenced indices indev
anvilbox
or this PR does not remove catalogs or otherwise causes unreferenced indices inanvildev
sandbox
or this PR is not labeledreindex:dev
anvilbox
or this PR is not labeledreindex:anvildev
sandbox
or this PR is not labeledreindex:dev
anvilbox
or this PR is not labeledreindex:anvildev
p
if the PR is also labeledpartial
Operator (chain shortening)
develop
or this PR is not labeledbase
chained
label from the blocked PR or this PR is not labeledbase
base
base
label from this PR or this PR is not labeledbase
Operator (after pushing the merge commit)
dev
anvildev
dev
dev
anvildev
anvildev
_select dev.shared && make -C terraform/shared apply
or this PR is not labeleddeploy:shared
_select anvildev.shared && make -C terraform/shared apply
or this PR is not labeleddeploy:shared
dev
anvildev
Operator (reindex)
dev
or this PR is neither labeledreindex:partial
norreindex:dev
anvildev
or this PR is neither labeledreindex:partial
norreindex:anvildev
dev
or this PR is neither labeledreindex:partial
norreindex:dev
anvildev
or this PR is neither labeledreindex:partial
norreindex:anvildev
dev
or this PR is neither labeledreindex:partial
norreindex:dev
anvildev
or this PR is neither labeledreindex:partial
norreindex:anvildev
dev
or this PR does not require reindexingdev
anvildev
or this PR does not require reindexinganvildev
dev
or this PR does not require reindexingdev
anvildev
or this PR does not require reindexinganvildev
dev
or this PR does not require reindexingdev
anvildev
or this PR does not require reindexinganvildev
Operator
deploy:shared
,deploy:gitlab
,deploy:runner
,API
,reindex:partial
,reindex:anvilprod
andreindex:prod
labels to the next promotion PRs or this PR carries none of these labelsdeploy:shared
,deploy:gitlab
,deploy:runner
,API
,reindex:partial
,reindex:anvilprod
andreindex:prod
labels, from the description of this PR to that of the next promotion PRs or this PR carries none of these labelsShorthand for review comments
L
line is too longW
line wrapping is wrongQ
bad quotesF
other formatting problem